
    Q-CapsNets: A Specialized Framework for Quantizing Capsule Networks

    Capsule Networks (CapsNets), recently proposed by the Google Brain team, offer superior learning capabilities on machine learning tasks such as image classification compared to traditional CNNs. However, CapsNets are extremely compute-intensive and difficult to deploy in their original form on resource-constrained edge devices. This paper makes the first attempt to quantize CapsNet models, enabling their efficient edge implementations, by developing a specialized quantization framework for CapsNets. We evaluate our framework on several benchmarks. On a deep CapsNet model for the CIFAR10 dataset, the framework reduces the memory footprint by 6.2x with only 0.15% accuracy loss. We will open-source our framework at https://git.io/JvDIF in August 2020. Comment: Accepted for publication at the Design Automation Conference 2020 (DAC 2020).
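    As a rough illustration of the kind of layer-wise, accuracy-constrained quantization such a framework could apply, the sketch below quantizes each layer's weights to progressively fewer bits while a user-supplied accuracy callback stays within a tolerance. The function names, the greedy search, and the 0.15% tolerance are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def quantize_tensor(x, n_bits=8):
    """Symmetric uniform quantization of a weight tensor to n_bits (simulated in float)."""
    scale = np.max(np.abs(x)) / (2 ** (n_bits - 1) - 1)
    if scale == 0:
        return x.copy()
    q = np.round(x / scale)                                    # map to integer grid
    q = np.clip(q, -(2 ** (n_bits - 1) - 1), 2 ** (n_bits - 1) - 1)
    return q * scale                                           # de-quantize for evaluation

def layerwise_search(weights, eval_accuracy, full_acc, tolerance=0.0015):
    """Greedily shrink each layer's bit-width while the accuracy drop stays within tolerance.

    weights: dict of layer name -> numpy array; eval_accuracy: callback that scores a
    weight dict; full_acc: accuracy of the unquantized model (both are assumptions here).
    """
    bitwidths = {}
    for name, w in weights.items():
        bits = 16
        while bits > 2:
            trial = {**weights, name: quantize_tensor(w, bits - 1)}
            if full_acc - eval_accuracy(trial) > tolerance:
                break                                          # next reduction costs too much accuracy
            bits -= 1
        weights[name] = quantize_tensor(w, bits)
        bitwidths[name] = bits
    return bitwidths
```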

    RoHNAS: A Neural Architecture Search Framework with Conjoint Optimization for Adversarial Robustness and Hardware Efficiency of Convolutional and Capsule Networks

    Neural Architecture Search (NAS) algorithms aim at finding efficient Deep Neural Network (DNN) architectures for a given application under given system constraints. DNNs are computationally complex as well as vulnerable to adversarial attacks. To address multiple design objectives, we propose RoHNAS, a novel NAS framework that jointly optimizes for adversarial robustness and hardware efficiency of DNNs executed on specialized hardware accelerators. Besides traditional convolutional DNNs, RoHNAS additionally accounts for complex DNN types such as Capsule Networks. To reduce the exploration time, RoHNAS analyzes and selects appropriate adversarial-perturbation values for each dataset to employ in the NAS flow. Extensive evaluations on multi-Graphics Processing Unit (GPU) High Performance Computing (HPC) nodes provide a set of Pareto-optimal solutions that expose the trade-offs between the above design objectives. For example, a Pareto-optimal DNN for the CIFAR-10 dataset exhibits 86.07% accuracy with an energy of 38.63 mJ, a memory footprint of 11.85 MiB, and a latency of 4.47 ms.
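    To illustrate the multi-objective selection step such a NAS flow relies on, the sketch below filters a population of candidate architectures down to its Pareto front over robustness, energy, and latency. The Candidate record and its metric names are illustrative assumptions, not RoHNAS internals.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    name: str
    robust_accuracy: float  # accuracy under adversarial perturbation (higher is better)
    energy_mj: float        # energy per inference in mJ (lower is better)
    latency_ms: float       # latency per inference in ms (lower is better)

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    no_worse = (a.robust_accuracy >= b.robust_accuracy
                and a.energy_mj <= b.energy_mj
                and a.latency_ms <= b.latency_ms)
    strictly_better = (a.robust_accuracy > b.robust_accuracy
                       or a.energy_mj < b.energy_mj
                       or a.latency_ms < b.latency_ms)
    return no_worse and strictly_better

def pareto_front(population: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in population
            if not any(dominates(other, c) for other in population if other is not c)]
```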

    FasTrCaps: An Integrated Framework for Fast yet Accurate Training of Capsule Networks

    Recently, Capsule Networks (CapsNets) have shown improved performance compared to traditional Convolutional Neural Networks (CNNs) by better encoding and preserving the spatial relationships between detected features. This is achieved through the so-called Capsules (i.e., groups of neurons), which encode both the instantiation probability and the spatial information. However, one of the major hurdles to the wide adoption of CapsNets is their gigantic training time, which is primarily due to the relatively higher complexity of their new constituent elements that differ from those of CNNs. In this paper, we implement different optimizations in the training loop of CapsNets and investigate how these optimizations affect training speed and accuracy. Towards this, we propose FasTrCaps, a novel framework that integrates multiple lightweight optimizations and a novel learning rate policy called WarmAdaBatch (which jointly performs warm restarts and adaptive batch sizing), and steers them appropriately to provide a high training-loop speedup at minimal accuracy loss. We also propose weight sharing for capsule layers. The goal is to reduce the hardware requirements of CapsNets by removing unused/redundant connections and capsules, while maintaining high accuracy through tests of different learning rate policies and batch sizes. We demonstrate that one of the solutions generated by the FasTrCaps framework achieves a 58.6% reduction in training time while preserving accuracy (even a 0.12% accuracy improvement on the MNIST dataset), compared to the CapsNet by Google Brain [25]. Moreover, the Pareto-optimal solutions generated by FasTrCaps can be leveraged to realize trade-offs between training time and achieved accuracy. We have open-sourced our framework on GitHub.
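    For intuition on the two ingredients named for WarmAdaBatch, the sketch below combines a cosine learning-rate schedule with warm restarts and a batch size that grows as training stabilizes. The restart period, growth factors, and constants are illustrative assumptions, not the paper's exact policy.

```python
import math

def warm_restart_lr(step, period=1000, lr_max=1e-3, lr_min=1e-5):
    """Cosine-annealed learning rate that restarts to lr_max every `period` steps."""
    t = step % period
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / period))

def adaptive_batch_size(epoch, base=16, growth=2, cap=256, grow_every=3):
    """Start with a small batch and grow it geometrically every few epochs, up to a cap."""
    return min(cap, base * growth ** (epoch // grow_every))

# Example: inspect the schedule for the first few epochs (500 steps per epoch assumed)
for epoch in range(6):
    print(epoch, adaptive_batch_size(epoch), round(warm_restart_lr(epoch * 500), 6))
```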

    Oncostatin M is overexpressed in NASH-related hepatocellular carcinoma and promotes cancer cell invasiveness and angiogenesis

    Oncostatin M (OSM) is a pleiotropic cytokine of the interleukin (IL)-6 family that contributes to the progression of chronic liver disease. Here we investigated the role of OSM in the development and progression of hepatocellular carcinoma (HCC) in NAFLD/NASH. The role of OSM was investigated in: a) selected cohorts of NAFLD/NASH HCC patients; b) liver cancer cells exposed to human recombinant OSM or stably transfected to overexpress human OSM; c) murine HCC xenografts; d) a murine NASH-related model of hepatic carcinogenesis. OSM was found to be selectively overexpressed in HCC cells of NAFLD/NASH patients, depending on tumor grade. OSM serum levels, barely detectable in patients with simple steatosis or NASH, were increased in patients with cirrhosis and even more so in those carrying HCC. In this latter group, OSM serum levels were significantly higher in subjects with intermediate/advanced HCCs and correlated with poor survival. Cell culture experiments indicated that OSM upregulation in hepatic cancer cells contributes to HCC progression by inducing epithelial-to-mesenchymal transition and increased invasiveness of cancer cells, as well as by inducing angiogenesis, which is of critical relevance. In murine xenografts, OSM overexpression was associated with slower tumor growth but an increased rate of lung metastases. Overexpression of OSM and its positive correlation with the angiogenic switch were also confirmed in a murine model of NAFLD/NASH-related hepatocarcinogenesis. Consistent with this, analysis of liver specimens from human NASH-related HCCs with vascular invasion showed that OSM was expressed by liver cancer cells invading hepatic vessels. In conclusion, OSM upregulation appears to be a specific feature of HCC arising on a NAFLD/NASH background, and it correlates with clinical parameters and disease outcome. Our data highlight a novel pro-carcinogenic contribution of OSM in NAFLD/NASH, suggesting a role for this factor as a prognostic marker and a putative target for therapy.

    Going Further With Winograd Convolutions: Tap-Wise Quantization for Efficient Inference on 4x4 Tiles

    Most of today's computer vision pipelines are built around deep neural networks, where convolution operations account for most of the generally high compute effort. The Winograd convolution algorithm computes convolutions with fewer MACs than the standard algorithm, reducing the operation count by a factor of 2.25x for 3x3 convolutions when using the version with 2x2-sized output tiles (F2). Even though the gain is significant, the Winograd algorithm with larger tile sizes, i.e., F4, offers even more potential for improving throughput and energy efficiency, as it reduces the required MACs by 4x. Unfortunately, the Winograd algorithm with larger tile sizes introduces numerical issues that prevent its use on integer domain-specific accelerators, as well as a higher computational overhead to transform input and output data between the spatial and Winograd domains. To unlock the full potential of Winograd F4, we propose a novel tap-wise quantization method that overcomes the numerical issues of using larger tiles, enabling integer-only inference. Moreover, we present custom hardware units that process the Winograd transformations in a power- and area-efficient way, and we show how to integrate such custom modules into an industrial-grade, programmable DSA. An extensive experimental evaluation on a large set of state-of-the-art computer vision benchmarks reveals that the tap-wise quantization algorithm makes the quantized Winograd F4 network almost as accurate as the FP32 baseline. The Winograd-enhanced DSA achieves up to 1.85x gain in energy efficiency and up to 1.83x end-to-end speed-up for state-of-the-art segmentation and detection networks.
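    For context on where the MAC savings come from, the sketch below shows the textbook 1-D Winograd F(2,3) algorithm, which produces two outputs of a 3-tap convolution with 4 multiplications instead of 6. This is only the baseline idea the abstract builds on; the paper's F4 variant and its tap-wise quantization are not reproduced here.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap convolution over a 4-element input tile,
    using 4 multiplications (the direct method needs 6)."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return (m1 + m2 + m3, m2 - m3 - m4)

# Cross-check against the direct (sliding-window) convolution
d, g = (1.0, 2.0, 3.0, 4.0), (0.5, -1.0, 2.0)
direct = (d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
          d[1] * g[0] + d[2] * g[1] + d[3] * g[2])
print(winograd_f23(d, g), direct)   # both print (4.5, 6.0)
```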

    Enabling Capsule Networks at the Edge through Approximate Softmax and Squash Operations

    Complex Deep Neural Networks such as Capsule Networks (CapsNets) exhibit high learning capabilities at the cost of compute-intensive operations. To enable their deployment on edge devices, we propose to leverage approximate computing to design approximate variants of complex operations like softmax and squash. In our experiments, we evaluate the trade-offs between the area, power consumption, and critical path delay of the designs implemented with an ASIC design flow, and the accuracy of the quantized CapsNets compared to using the exact functions.
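    For reference, the sketch below gives the exact squash non-linearity from the original CapsNet formulation and the exact softmax, together with one generic style of softmax approximation (replacing e^x with rounded powers of two, which map to shifts in fixed-point hardware). The approximation shown is an illustration of the general approach, not the specific circuits designed in this work.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Exact CapsNet squash: scale vector s to length ||s||^2 / (1 + ||s||^2)."""
    norm_sq = np.sum(s * s)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def softmax(z):
    """Exact softmax with the usual max-subtraction for numerical stability."""
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def approx_softmax_pow2(z):
    """Illustrative approximation: replace e^x with 2^round(x / ln 2), so the
    exponentiation becomes a shift; the accuracy impact must be checked per model."""
    q = np.round((z - np.max(z)) / np.log(2.0))
    e = np.power(2.0, q)
    return e / np.sum(e)

v = np.array([1.0, 2.0, 0.5])
print(squash(v), softmax(v), approx_softmax_pow2(v))
```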

    NLCMAP: A Framework for the Efficient Mapping of Non-Linear Convolutional Neural Networks on FPGA Accelerators

    This paper introduces NLCMap, a framework for mapping-space exploration targeting Non-Linear Convolutional Networks (NLCNs). NLCNs [1] are a novel neural network model that improves performance in certain computer vision applications by introducing a non-linearity in the weight computation. NLCNs are more challenging to map efficiently onto hardware accelerators than traditional Convolutional Neural Networks (CNNs), due to data dependencies and additional computations. To this aim, we propose NLCMap, a framework that, given an NLC layer and a generic hardware accelerator with a certain on-chip memory budget, finds the optimal mapping that minimizes accesses to the off-chip memory, which are often the critical aspect in CNN acceleration.
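    A simplified sketch of this kind of mapping-space exploration: enumerate output-tile sizes, discard tilings whose working set exceeds the on-chip buffer, and keep the one with the fewest off-chip transfers. The footprint and traffic formulas below are deliberately crude placeholders for a 3x3 convolution layer, not NLCMap's actual analytical model.

```python
from itertools import product

def explore_mappings(H, W, C, K, buffer_bytes, elem_bytes=1):
    """Brute-force search over output-tile sizes (th, tw, tc) minimizing off-chip traffic."""
    best = None
    for th, tw, tc in product((1, 2, 4, 8, 16, 32), repeat=3):
        if th > H or tw > W or tc > K:
            continue
        # Crude on-chip footprint: input tile (with 3x3 halo) + weight slice + output tile
        footprint = elem_bytes * ((th + 2) * (tw + 2) * C + 3 * 3 * C * tc + th * tw * tc)
        if footprint > buffer_bytes:
            continue
        n_tiles = -(-H // th) * -(-W // tw) * -(-K // tc)   # ceiling division per dimension
        # Crude off-chip traffic: every tile reloads its input halo and its weight slice
        traffic = n_tiles * elem_bytes * ((th + 2) * (tw + 2) * C + 3 * 3 * C * tc)
        if best is None or traffic < best[0]:
            best = (traffic, (th, tw, tc))
    return best

print(explore_mappings(H=56, W=56, C=64, K=128, buffer_bytes=128 * 1024))
```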